Phonetic segmentation using multiple speech features

Authors

  • Iosif Mporas
  • Todor Ganchev
  • Nikos Fakotakis
Abstract

In this paper we propose a method for improving the segmentation of speech waveforms into phonetic segments. The proposed method is based on the well-known Viterbi time-alignment algorithm and utilizes the phonetic boundary predictions obtained from multiple speech parameterization techniques. Specifically, for each boundary we take the phone transition position predicted by the parameterization that performs best for that boundary type, and use it as the starting point of the Viterbi time-alignment that predicts the next phonetic boundary. The method was evaluated on the TIMIT database using several Fourier-based and wavelet-based speech parameterization algorithms that are well known in speech processing. For a tolerance of 20 milliseconds, the results indicate an absolute improvement in segmentation accuracy of approximately 0.70% over the baseline speech segmentation scheme.
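
The two quantities the abstract relies on, segmentation accuracy within a tolerance and the choice of the best parameterization per boundary type, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the boundary-type labels, and the toy numbers are assumptions made here for illustration only, and the sketch assumes a one-to-one pairing between predicted and reference boundaries, as in forced alignment against a known phone sequence.

```python
from typing import Dict, Sequence


def boundary_accuracy(predicted_ms: Sequence[float],
                      reference_ms: Sequence[float],
                      tolerance_ms: float = 20.0) -> float:
    """Fraction of boundaries predicted within +/- tolerance_ms of the reference.

    Assumes predicted_ms[i] and reference_ms[i] describe the same boundary.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference boundaries must be paired")
    if not reference_ms:
        return 0.0
    hits = sum(1 for p, r in zip(predicted_ms, reference_ms)
               if abs(p - r) <= tolerance_ms)
    return hits / len(reference_ms)


def best_feature_per_boundary_type(
        accuracy: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each boundary type, pick the parameterization with the highest accuracy.

    accuracy maps feature name -> {boundary type -> accuracy on held-out data}.
    """
    best: Dict[str, str] = {}
    for feature, per_type in accuracy.items():
        for boundary_type, acc in per_type.items():
            current = best.get(boundary_type)
            if current is None or acc > accuracy[current][boundary_type]:
                best[boundary_type] = feature
    return best


# Toy values, invented purely for illustration (not results from the paper).
toy_scores = {
    "fourier_based": {"vowel-stop": 0.90, "stop-vowel": 0.85},
    "wavelet_based": {"vowel-stop": 0.88, "stop-vowel": 0.91},
}
selection = best_feature_per_boundary_type(toy_scores)
acc = boundary_accuracy([100.0, 250.0, 400.0], [110.0, 280.0, 400.0])
```

In a scheme like the one described, such a per-type selection table would decide which parameterization's boundary prediction seeds the alignment of the following phone; how the paper actually estimates and applies that table is not stated in the abstract.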

Similar resources

Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers

In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for unsatisfactory performance of the state-of-the-art ASR systems, that are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low level or phonetic level linguistic information in the speech...

Acoustic cues identifying phonetic transitions for speech segmentation

The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place boundaries at phonetic transitions according to a set of conventions. We present some features commonly (and informally) used as aid when performing manual s...

Trying to mimic human segmentation of speech using HMM and fuzzy logic post-correction rules

The process of human segmentation and labelling of speech can be seen as a two-step process. In the first step humans listen to a speech signal, recognize the word and phoneme sequence, and roughly determine the position of each phonetic boundary. In the second step humans examine several speech signal features (waveform, energy, spectrogram, etc.) to place a phonetic boundary time mark where t...

Using Fuzzy Logic and Features Measured from the Time Domain to Achieve Smart Separation of Phonetic Units

The segmentation of uttered speech into phonetic units is a key processing task for successfully implementing speech recognition systems. This paper presents a smart approach to phonetic segmentation of uttered speech that separates vowels from consonants. Time-domain feature-extraction algorithms are applied to speech to extract features at minimum computational cost. Fuzzy decision logic is u...

An Automatic Syllable Segmentation Method for Mandarin Speech

An automatic syllable segmentation method for Mandarin speech is proposed. The method uses five features and the corresponding phonetic transcriptions. First, the speech signal is pre-filtered. Second, the pre-filtered signal is divided into 30 ms segments and the five features of each segment are computed. Finally, syllable segmentation performs based on the ph...

Journal:
  • I. J. Speech Technology

Volume 11  Issue 

Pages  -

Publication date: 2008